69 research outputs found
Dissecting Arbitrary-scale Super-resolution Capability from Pre-trained Diffusion Generative Models
Diffusion-based Generative Models (DGMs) have achieved unparalleled
performance in synthesizing high-quality visual content, opening up the
opportunity to improve image super-resolution (SR) tasks. Recent solutions for
these tasks often train architecture-specific DGMs from scratch, or require
iterative fine-tuning and distillation on pre-trained DGMs, both of which take
considerable time and hardware investments. More seriously, since the DGMs are
established with a discrete pre-defined upsampling scale, they cannot well
match the emerging requirements of arbitrary-scale super-resolution (ASSR),
where a unified model adapts to arbitrary upsampling scales, instead of
preparing a series of distinct models for each case. These limitations beg an
intriguing question: can we identify the ASSR capability of existing
pre-trained DGMs without the need for distillation or fine-tuning? In this
paper, we take a step towards resolving this matter by proposing Diff-SR, a
first ASSR attempt based solely on pre-trained DGMs, without additional
training efforts. It is motivated by an exciting finding that a simple
methodology, which first injects a specific amount of noise into the
low-resolution images before invoking a DGM's backward diffusion process,
outperforms current leading solutions. The key insight is determining a
suitable amount of noise to inject, i.e., small amounts lead to poor low-level
fidelity, while over-large amounts degrade the high-level signature. Through a
finely-grained theoretical analysis, we propose the Perceptual Recoverable
Field (PRF), a metric that achieves the optimal trade-off between these two
factors. Extensive experiments verify the effectiveness, flexibility, and
adaptability of Diff-SR, demonstrating superior performance to state-of-the-art
solutions under diverse ASSR environments.
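
The noise-injection step described in this abstract can be illustrated with a minimal sketch. It assumes the standard DDPM forward process with a linear beta schedule; the schedule values, the step count, and the names `alpha_bar` and `inject_noise` are illustrative assumptions, not details from the paper. The sketch does not implement the Perceptual Recoverable Field (PRF); it only shows how the timestep `t` controls how much noise is mixed into an (already upsampled) low-resolution image before a pre-trained model's backward process would take over.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal retention after t forward steps (assumed linear beta schedule)."""
    if not 0 <= t <= num_steps:
        raise ValueError("t must lie in [0, num_steps]")
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def inject_noise(pixels, t, rng):
    """Mix Gaussian noise into a flat list of pixel values at noise level t.

    Implements x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
    A small t keeps the input almost unchanged; a large t mostly replaces it with noise.
    """
    a = alpha_bar(t)
    signal_scale = math.sqrt(a)
    noise_scale = math.sqrt(1.0 - a)
    return [signal_scale * x + noise_scale * rng.gauss(0.0, 1.0) for x in pixels]


# Hypothetical usage: a tiny "image" flattened to a list, noised at a moderate level.
rng = random.Random(0)
noisy = inject_noise([0.1, 0.5, 0.9, 0.3], t=300, rng=rng)
```

Choosing `t` is the trade-off the abstract describes: too little noise leaves the model little room to restore detail, while too much discards the content of the input.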
Spatio-Temporal Calibration for Omni-Directional Vehicle-Mounted
We present a solution to the problem of spatio-temporal calibration for event
cameras mounted on an omni-directional vehicle. Different from traditional
methods that typically determine the camera's pose with respect to the
vehicle's body frame using alignment of trajectories, our approach leverages
the kinematic correlation of two sets of linear velocity estimates from event
data and wheel odometers, respectively. The overall calibration task consists
of estimating the underlying temporal offset between the two heterogeneous
sensors, and furthermore, recovering the extrinsic rotation that defines the
linear relationship between the two sets of velocity estimates. The first
sub-problem is formulated as an optimization one, which looks for the optimal
temporal offset that maximizes a correlation measurement invariant to arbitrary
linear transformation. Once the temporal offset is compensated, the extrinsic
rotation can be worked out with an iterative closed-form solver that
incrementally registers associated linear velocity estimates. The proposed
algorithm proves effective on both synthetic and real data,
outperforming traditional methods based on alignment of trajectories.
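
The two-stage structure described in this abstract (temporal offset first, extrinsic rotation second) can be sketched in a simplified planar form. This is not the authors' method: the sketch swaps in Pearson correlation of speed magnitudes, which is invariant only to rotation rather than to arbitrary linear transformations, and a one-shot 2D closed-form rotation fit rather than the iterative solver the abstract mentions. The function names, the integer-sample shift, and the 2D restriction are all assumptions made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def estimate_shift(cam_vel, wheel_vel, max_shift):
    """Find the integer shift k such that cam_vel[i + k] best matches wheel_vel[i].

    Speeds (vector norms) are compared, so the unknown rotation does not matter here.
    """
    cam_speed = [math.hypot(vx, vy) for vx, vy in cam_vel]
    wheel_speed = [math.hypot(vx, vy) for vx, vy in wheel_vel]
    best_shift, best_corr = 0, -2.0
    for k in range(-max_shift, max_shift + 1):
        pairs = [(cam_speed[i + k], wheel_speed[i])
                 for i in range(len(wheel_speed))
                 if 0 <= i + k < len(cam_speed)]
        if len(pairs) < 3:
            continue  # too little overlap to compute a meaningful correlation
        corr = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        if corr > best_corr:
            best_shift, best_corr = k, corr
    return best_shift


def estimate_rotation_2d(src, dst):
    """Angle theta minimizing sum ||R(theta) * src_i - dst_i||^2 (closed form in 2D)."""
    cross = sum(x1 * y2 - y1 * x2 for (x1, y1), (x2, y2) in zip(src, dst))
    dot = sum(x1 * x2 + y1 * y2 for (x1, y1), (x2, y2) in zip(src, dst))
    return math.atan2(cross, dot)
```

In use, one would first call `estimate_shift`, drop the non-overlapping samples so the two streams are time-aligned, and then pass the aligned wheel and camera velocities to `estimate_rotation_2d`.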
Research for Inertia Response and Primary Frequency Regulation Ability of Wind Turbine
[Introduction] Large-scale connection of wind power to the power grid poses great challenges to the stability (especially frequency stability) of grid operation. In order to solve the problem of inadequate frequency regulation capability caused by large-scale connection of wind power to the power grid and improve the frequency adaptability of wind power grid connection, wind turbines need to have frequency regulation function and response timeliness. [Method] This paper adopted a frequency regulation system scheme based on rotor kinetic energy and pitch angle reserve, which could provide active support for the power grid quickly and accurately during the power grid frequency change. Firstly, the main control algorithm was designed based on the theoretical analysis of inertia response and primary frequency regulation algorithm logic. Then, the functional verification was carried out on the co-simulation platform. Finally, the actual test was carried out in a project. [Result] The simulation and test results showed that the frequency regulation system scheme based on rotor kinetic energy and pitch angle reserve could cope with a variety of grid frequency changes and quickly provide active support. [Conclusion] The frequency regulation system scheme of wind turbines can perform a fast inertia response (with the response time less than 500 ms) and primary frequency regulation response (with the response time less than 5 s) under various frequency change conditions and provide active support for the power grid, which can help recover the grid frequency and effectively improve the frequency adaptability of wind turbines.
Uremia toxin helps to induce inflammation in intestines by activating the ATM/NEMO/NF-kB signalling pathway in human intestinal epithelial cells
During progressive chronic kidney disease, toxic substances known as uremic toxins accumulate in body fluids. Uremia toxin has been documented to be involved in most inflammatory reactions, and indoxyl-sulfate (IS), a major serum metabolite of uremia, is a key player in this. The mechanism by which uremia toxin establishes its inflammatory activity is scarcely known; however, researchers believe that a clear understanding of this process can serve as a guide to combat the situation. The study was designed to investigate the role played by uremia toxin in intestinal inflammation. SW480 cells were used as the cell line for this study. Luciferase assay was used to detect cell viability at different concentrations of IS. RT-qPCR was used to detect the effect of IS on the expression of inflammatory factors. The comet assay was used as a tool to detect DNA damage. Western blot was used to detect the phosphorylation level of ATM/NEMO/NF-kB protein. An IS concentration of 0.09 nM was determined to be the best experimental concentration by luciferase assay. Results showed that IS promotes the expression of inflammatory factors TNF-α and IL-6. In addition, IS led to enhanced DNA damage in cells. IS promoted ATM phosphorylation leading to phosphorylation of NEMO to activate the NF-kB signalling pathway. In conclusion, uremia toxin facilitates inflammation in intestines by activating the ATM/NEMO/NF-kB signalling pathway in human intestinal epithelial cells.
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic
lyrics transcription method achieving state-of-the-art performance on various
lyrics transcription datasets, even in challenging genres such as rock and
metal. Our novel, training-free approach utilizes Whisper, a weakly supervised
robust speech recognition model, and GPT-4, today's most performant chat-based
large language model. In the proposed method, Whisper functions as the "ear" by
transcribing the audio, while GPT-4 serves as the "brain," acting as an
annotator with a strong performance for contextualized output selection and
correction. Our experiments show that LyricWhiz significantly reduces Word
Error Rate compared to existing methods in English and can effectively
transcribe lyrics across multiple languages. Furthermore, we use LyricWhiz to
create the first publicly available, large-scale, multilingual lyrics
transcription dataset with a CC-BY-NC-SA copyright license, based on
MTG-Jamendo, and offer a human-annotated subset for noise level estimation and
evaluation. We anticipate that our proposed method and dataset will advance the
development of multilingual lyrics transcription, a challenging and emerging
task. Comment: 9 pages, 2 figures, 5 tables, accepted by ISMIR 202
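
Word Error Rate, the metric this abstract reports, is conventionally computed as the word-level edit distance between reference and hypothesis divided by the number of reference words. The sketch below assumes whitespace tokenization and no text normalization; the function name is illustrative and this is not the evaluation code used for LyricWhiz.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = cur[j - 1] + 1   # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.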
On the Effectiveness of Speech Self-supervised Learning for Music
Self-supervised learning (SSL) has shown promising results in various speech
and natural language processing applications. However, its efficacy in music
information retrieval (MIR) still remains largely unexplored. While previous
SSL models pre-trained on music recordings may have been mostly closed-sourced,
recent speech models such as wav2vec2.0 have shown promise in music modelling.
Nevertheless, research exploring the effectiveness of applying speech SSL
models to music recordings has been limited. We explore the music adaptation of
SSL with two distinctive speech-related models, data2vec1.0 and HuBERT, and
refer to them as music2vec and musicHuBERT, respectively. We train SSL
models with 95M parameters under various pre-training configurations and
systematically evaluate the MIR task performances with 13 different MIR tasks.
Our findings suggest that training with music data can generally improve
performance on MIR tasks, even when models are trained using paradigms designed
for speech. However, we identify the limitations of such existing
speech-oriented designs, especially in modelling polyphonic information. Based
on the experimental results, empirical suggestions are also given for designing
future musical SSL strategies and paradigms.
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Self-supervised learning (SSL) has recently emerged as a promising paradigm
for training generalisable models on large-scale data in the fields of vision,
text, and speech. Although SSL has been proven effective in speech and audio,
its application to music audio has yet to be thoroughly explored. This is
primarily due to the distinctive challenges associated with modelling musical
knowledge, particularly the tonal and pitched characteristics of music. To
address this research gap, we propose an acoustic Music undERstanding model
with large-scale self-supervised Training (MERT), which incorporates teacher
models to provide pseudo labels in the masked language modelling (MLM) style
acoustic pre-training. In our exploration, we identified a superior combination
of teacher models, which outperforms conventional speech and audio approaches
in terms of performance. This combination includes an acoustic teacher based on
Residual Vector Quantization - Variational AutoEncoder (RVQ-VAE) and a musical
teacher based on the Constant-Q Transform (CQT). These teachers effectively
guide our student model, a BERT-style transformer encoder, to better model
music audio. In addition, we introduce an in-batch noise mixture augmentation
to enhance the representation robustness. Furthermore, we explore a wide range
of settings to overcome the instability in acoustic language model
pre-training, which allows our designed paradigm to scale from 95M to 330M
parameters. Experimental results indicate that our model can generalise and
perform well on 14 music understanding tasks and attains state-of-the-art
(SOTA) overall scores. The code and models are online:
https://github.com/yizhilll/MERT
Robust Visual Compass Using Hybrid Features for Indoor Environments
Orientation estimation is a crucial part of robotics tasks such as motion control, autonomous navigation, and 3D mapping. In this paper, we propose a robust visual-based method to estimate robots’ drift-free orientation with RGB-D cameras. First, we detect and track hybrid features (i.e., plane, line, and point) from color and depth images, which provides reliable constraints even in uncharacteristic environments with low texture or no consistent lines. Then, we construct a cost function based on these features and, by minimizing this function, we obtain the accurate rotation matrix of each captured frame with respect to its reference keyframe. Furthermore, we present a vanishing direction-estimation method to extract the Manhattan World (MW) axes; by aligning the current MW axes with the global MW axes, we refine the aforementioned rotation matrix of each keyframe and achieve drift-free orientation. Experiments on public RGB-D datasets demonstrate the robustness and accuracy of the proposed algorithm for orientation estimation. In addition, we have applied our proposed visual compass to pose estimation, and the evaluation on public sequences shows improved accuracy.